Under-sampling method based on sample density peaks for imbalanced data

SU Junning, YE Dongyi

Journal of Computer Applications 2020, 40 (1): 83-89. DOI: 10.11772/j.issn.1001-9081.2019060962

Abstract

Imbalanced data classification is an important problem in data mining and machine learning, and the way the data are re-sampled strongly affects classification accuracy. Existing under-sampling methods for imbalanced data cannot keep the distribution of the sampled data in good agreement with that of the original data. To address this problem, an under-sampling method based on sample density peaks was proposed. Firstly, the density peak clustering algorithm was applied to cluster the majority-class samples and to estimate the central and boundary regions of each resulting cluster, so that the weight of each sample was determined by its local density and by the density peak distribution of the cluster region in which it lies. Then, the majority-class samples were under-sampled according to these weights, so that the number of majority-class samples drawn decreased gradually from the central region of each cluster toward its boundary region. In this way, the sampled data reflect the original distribution well while suppressing noise. Finally, a balanced data set was constructed from the sampled majority-class samples and all minority-class samples and used to train the classifier. Experimental results on multiple datasets show that the proposed method improves F1-measure and G-mean compared with existing methods such as RBBag (Roughly Balanced Bagging), uNBBag (under-sampling NeighBorhood Bagging) and KAcBag (K-means AdaCost bagging), indicating that it is an effective and feasible sampling method.
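The abstract describes a three-step pipeline: density peak clustering of the majority class, density-based sample weights, and weighted under-sampling before merging with the minority class. The Python/NumPy code below is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation. It assumes the standard density peak decision rule: a cutoff-kernel local density `rho`, the distance `delta` to the nearest denser point, and the `n_clusters` points with the largest `rho * delta` taken as cluster centres. The per-sample weight (density relative to the peak of the sample's own cluster) is a hypothetical stand-in, because the abstract does not give the exact weighting formula or how boundary regions are scored. The function names `density_peaks`, `dp_undersample` and `balanced_set`, and the parameters `dc` (cutoff distance) and `n_clusters`, are illustrative choices.

```python
import numpy as np


def density_peaks(X, dc, n_clusters):
    """Density peak clustering with a cutoff kernel (Rodriguez-Laio style).

    Returns the local density rho and a cluster label for every row of X.
    """
    n = len(X)
    # Pairwise Euclidean distances, shape (n, n).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Local density: neighbours closer than the cutoff dc (the point itself excluded).
    rho = (D < dc).sum(axis=1) - 1
    # Visit points from highest to lowest density; ties keep index order.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = D[i].max()  # global peak: distance to the farthest point
            continue
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]  # distance to the nearest point of higher density
        nearest_higher[i] = j
    # Decision rule: the n_clusters points with the largest rho * delta are centres.
    # The global peak always ranks first, so every other point has a labelled ancestor.
    centres = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centres] = np.arange(n_clusters)
    # Remaining points inherit the label of their nearest higher-density neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, labels


def dp_undersample(X_maj, n_keep, dc, n_clusters, seed=0):
    """Hypothetical density-weighted under-sampling of the majority class."""
    rho, labels = density_peaks(X_maj, dc, n_clusters)
    weights = np.empty(len(X_maj))
    for c in range(n_clusters):
        members = labels == c
        # Assumption: weight = density relative to the cluster's own peak, so samples
        # near a centre get weight close to 1 and sparse boundary/noise samples get less.
        # The +1 keeps isolated points (rho == 0) at a small non-zero weight.
        weights[members] = (rho[members] + 1) / (rho[members].max() + 1)
    rng = np.random.default_rng(seed)
    # Weighted draw without replacement: dense cluster cores are kept more often.
    idx = rng.choice(len(X_maj), size=n_keep, replace=False, p=weights / weights.sum())
    return X_maj[idx]


def balanced_set(X_maj, X_min, dc, n_clusters, seed=0):
    """Keep as many majority samples as minority samples; label minority as 1."""
    X_kept = dp_undersample(X_maj, len(X_min), dc, n_clusters, seed)
    X = np.vstack([X_kept, X_min])
    y = np.concatenate([np.zeros(len(X_kept), dtype=int), np.ones(len(X_min), dtype=int)])
    return X, y


# Usage on synthetic data: two majority blobs and one small minority blob.
rng = np.random.default_rng(1)
X_maj = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(6, 1, (150, 2))])
X_min = rng.normal(3, 0.5, (40, 2))
X, y = balanced_set(X_maj, X_min, dc=1.0, n_clusters=2)
print(X.shape, y.sum())  # (80, 2) 40
```

Normalising density within each cluster, rather than across the whole majority class, keeps a sparse majority cluster from being dropped entirely while still drawing fewer samples from its boundary region.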
